Masafumi Okada M.D., Ph.D.
University Hospital Medical Information Network research center, University of Tokyo.
From the Define-XML 2.0 specification:
The purpose of Define-XML is to support the interchange of dataset metadata for clinical research applications in a machine-readable format.
Typical metadata look like this:
| Item | Variable Name | Variable Type | Length | Terminology |
|---|---|---|---|---|
| Age | AGE | Integer | 3 | N/A |
| Start Date/Time of Visit | SVSTDTC | Character | 25 | ISO8601 |
| Vitamin B12 | VITB12 | Integer | 3 | N/A |
In practice, however, many users create the Define-XML only after the dataset is finalized.
In that case there are two sets of metadata for your dataset: one defined (even roughly) in the protocol document, the other defined in Define-XML.
Are these two always consistent?
However, I do not know of any EDC or CDMS that supports validating a dataset against Define-XML…
This is why I wrote a small piece of software in R that validates the data against the metadata defined in Define-XML. It is called Define2Validate.
<!-- ORIGINAL Define-XML -->
<ItemDef OID="IT.LB.LBSEQ" Name="LBSEQ" DataType="integer" Length="2"
         SASFieldName="LBSEQ">
  <Description>
    <TranslatedText xml:lang="en">Sequence Number</TranslatedText>
  </Description>
  <def:Origin Type="Derived"/>
</ItemDef>
# Generated YAML Rule
-
  expr: 'nchar(as.character(LBSEQ)) <= 2'
  name: Length of LBSEQ
-
  expr: '!is.na(LBSEQ)'
  name: LBSEQ is mandatory
-
  expr: 'regexpr("^[0-9-]+$",as.character(LBSEQ)) == 1'
  name: LBSEQ should be integer
Define2Validate is implemented as a function in R, a language for statistical computing.
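The mapping above, from ItemDef attributes to rule expressions, can be sketched in a few lines of base R. This is only an illustration, not the Define2Validate source: the helper `item_to_rules()` and its arguments are hypothetical, and the `mandatory` flag is assumed to be supplied by the caller because the ItemDef shown above does not carry it.

```r
# Hypothetical helper (not part of Define2Validate): turn a few ItemDef
# attributes into rule expressions of the kind shown in the generated YAML.
item_to_rules <- function(name, data_type, max_length = NA, mandatory = FALSE) {
  rules <- list()
  if (!is.na(max_length)) {
    # Length attribute -> upper bound on the number of characters
    rules <- c(rules, list(list(
      expr = sprintf("nchar(as.character(%s)) <= %d", name, as.integer(max_length)),
      name = sprintf("Length of %s", name))))
  }
  if (mandatory) {
    # Assumption: the caller knows whether the variable is mandatory
    rules <- c(rules, list(list(
      expr = sprintf("!is.na(%s)", name),
      name = sprintf("%s is mandatory", name))))
  }
  if (identical(data_type, "integer")) {
    # DataType="integer" -> same pattern check as in the generated rule
    rules <- c(rules, list(list(
      expr = sprintf('regexpr("^[0-9-]+$",as.character(%s)) == 1', name),
      name = sprintf("%s should be integer", name))))
  }
  rules
}

rules <- item_to_rules("LBSEQ", "integer", max_length = 2, mandatory = TRUE)
for (r in rules) cat(r$name, ":", r$expr, "\n")
```

Keeping each rule as a plain `expr`/`name` pair means the list can be written out as YAML entries like the ones above without further transformation.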
library(validate)  # validator(), confront()
library(R4DSXML)   # read.dataset.xml(), getCT()
Domain <- "LB"
define2validate(domain=Domain, file="exampleRules.yaml", definexml="Odm_Define.xml", overwrite=TRUE)
## Read Rules file by validate package
v <- validator(.file="exampleRules.yaml")
## Read your Dataset-XML by R4DSXML package.
x <- read.dataset.xml(paste("Odm_", Domain, ".xml", sep=""), "Odm_Define.xml")
## Read Controlled Terminology Definitions by R4DSXML package.
CT <- getCT("Odm_Define.xml")
## Run Validation
cf <- confront(x,v)
## Display Results
head(summary(cf))
##                                rule
## 1                  Length of LBBLFL
## 2         VISITDY should be integer
## 3           Length of BILI(LBORRES)
## 4     BILI(LBORRES) should be float
## 5 VITB12(LBORRES) should be integer
##   items passes fails nNA error warning
## 1     6      3     0   3 FALSE   FALSE
## 2     6      6     0   0 FALSE   FALSE
## 3     6      5     1   0 FALSE   FALSE
## 4     6      6     0   0 FALSE   FALSE
## 5     6      6     0   0 FALSE   FALSE
##   expression
## 1 nchar(as.character(LBBLFL)) <= 1
## 2 regexpr("^[0-9-]+$", as.character(VISITDY)) == 1
## 3 (!(LBTESTCD == "BILI" & LBCAT == "CHEMISTRY" & LBSPEC == "BLOOD")) | nchar(as.character(LBORRES)) <= 3
## 4 (!(LBTESTCD == "BILI" & LBCAT == "CHEMISTRY" & LBSPEC == "BLOOD")) | (regexpr("^[0-9.+-eE]+$", as.character(LBORRES)) == 1) & !is.na(as.numeric(LBORRES))
## 5 (!(LBTESTCD == "VITB12" & LBCAT == "CHEMISTRY" & LBSPEC == "SERUM")) | regexpr("^[0-9-]+$", as.character(LBORRES)) == 1
barplot(cf)
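To make the passes, fails and nNA columns of the summary easier to read, the following base R sketch evaluates two rule expressions against a small data frame. It only mimics the counts and is not how the validate package implements `confront()`. The helper `check_rule()` and the toy values are invented for illustration, and the conditional rule is simplified to a single condition on LBTESTCD.

```r
# Toy data; the values are invented for illustration only.
lb <- data.frame(
  LBTESTCD = c("BILI", "BILI", "VITB12", "VITB12"),
  LBORRES  = c("0.9", "12.5", "350", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate one rule expression (given as a string)
# with the data frame columns in scope, then count the outcomes.
check_rule <- function(expr, data) {
  result <- eval(parse(text = expr), envir = data)
  c(items  = length(result),
    passes = sum(result, na.rm = TRUE),
    fails  = sum(!result, na.rm = TRUE),
    nNA    = sum(is.na(result)))
}

# Unconditional rule: a missing LBORRES yields NA, which is counted as nNA.
plain <- check_rule('nchar(as.character(LBORRES)) <= 3', lb)

# Conditional rule in the "(!condition) | check" form used above:
# rows that are not BILI pass automatically, BILI rows must satisfy the check.
conditional <- check_rule(
  '(!(LBTESTCD == "BILI")) | nchar(as.character(LBORRES)) <= 3', lb)

print(rbind(plain, conditional))
```

Writing a value-level rule as `(!condition) | check` is the logical implication "if condition then check", so records outside the condition never count as failures.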